Now that we have a working gateway, we will look at how to turn our notebook into something more appropriate for infrastructure use. In this section we are primarily concerned with offline and/or batch notebook execution.

Identity and access management

We will assume that our offline notebooks will still be interacting with the Agave Platform. As such, we need to address account, token, and session management to ensure that the notebooks and their data remain secure for each of our runs.

In the previous notebooks, we worked interactively under a single training account. We generated a new set of client keys for that account and used them to create and refresh our access token on demand. In a batch environment, we still need an access token to interact with the Agave Platform, but how we obtain one will vary based on how your gateway handles user identity and access.

A full discussion of the different ways to handle IAM in gateway environments is beyond the scope of this tutorial. Here we will briefly touch on the two most common methods used in academic science gateways today: community accounts and user proxying. For each method, we highlight the special considerations, with respect to token and notebook management, involved in running our training notebooks under that IAM model.

For a deeper discussion of IAM in science gateways, please see the Science Gateways Community Institute website.

Community accounts

Community (or robot) accounts are special, shared accounts usually allocated to gateways that need to act on behalf of multiple users. These accounts are generally no different from standard user accounts, save for some additional oversight in exchange for the relaxed usage restrictions afforded to them by the issuing organization.

Science gateways generally use a community account when they want to orchestrate activity for their user community without requiring users to have accounts on the underlying system. A gateway, for example, might obtain a bulk allocation of compute and storage resources, then use that allocation to provide all the computational and storage capacity for its end users.

While using a single account to interact with backend resources does simplify some things, it also pushes the responsibility of properly securing and accounting for the actions of each end user onto the gateway.

Token management

The presence of a single account makes it attractive to share a single managed token across all notebooks. However, this introduces security risks if the access token or refresh token is compromised. It also creates a situation where jobs running on different hosts can fail unnecessarily due to race conditions that arise when synchronizing tokens across hosts after a refresh.

The recommended approach is to generate a new set of Agave client keys for each notebook run and use that client to request a fresh token for each job. The job can then refresh the token as needed without impacting any other jobs or infrastructure services. Upon completion of the job, the tokens should be explicitly revoked and the client keys invalidated through the Clients API.

Isolating tokens and client keys in this manner significantly limits the damage done if keys are written to disk or otherwise exposed through application logs, output, or provenance records.
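
To make this concrete, here is a minimal sketch of that lifecycle using Python's requests library against the Agave Clients and OAuth endpoints. The tenant URL, account credentials, and client name are placeholder assumptions; verify the endpoints and payloads against your own tenant's documentation.

In [ ]:
import requests

TENANT = 'https://public.agaveapi.co'   # placeholder tenant base URL
USER, PASS = 'community', 'changeme'    # placeholder account credentials
CLIENT = 'notebook-job-1234'            # one throwaway client per job run

# 1. Create a client for this job via the Clients API (HTTP basic auth)
resp = requests.post(TENANT + '/clients/v2',
                     auth=(USER, PASS),
                     data={'clientName': CLIENT})
result = resp.json()['result']
key, secret = result['consumerKey'], result['consumerSecret']

# 2. Exchange the client keys for a fresh access/refresh token pair
resp = requests.post(TENANT + '/token',
                     auth=(key, secret),
                     data={'grant_type': 'password', 'username': USER,
                           'password': PASS, 'scope': 'PRODUCTION'})
tokens = resp.json()

# ... run the job with tokens['access_token'], refreshing as needed ...

# 3. On completion, delete the client, invalidating its keys and tokens
requests.delete(TENANT + '/clients/v2/' + CLIENT, auth=(USER, PASS))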

Notebook management

It is up to the gateway to ensure the isolation of the notebook process runtime, temp files, and assets for each invocation. Because the notebooks run under a common account, standard Linux user separation is not available to isolate notebooks run by different users. It is up to you to create a proper namespace for each user and their job data, and to prevent data or process leaks between them.

If your gateway provides access to a predefined set of codes that you control, then you can take the necessary precautions to prevent any bleeding of data access between apps and jobs. Namespacing job runs and sanitizing input and output file paths, parameters, and network requests will effectively isolate each job and keep things orderly.

If your gateway allows user-defined codes, then it is much more likely that users will do things to violate whatever conventions you would otherwise use to reasonably secure a job. The easiest approach, if possible, is simply to isolate each run in a container or VM, mount only enough of the file system to expose the user data necessary for the run, disable local network communication, and restrict outbound communication to known ports and IP ranges.
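
As an illustration, a per-run container invocation along those lines might look like the following; the image name and the JOB_DIR variable are placeholder assumptions. Note that --network none removes networking entirely; attaching a custom network instead would let you restrict, rather than remove, outbound traffic.

In [ ]:
# Run one job in a throwaway container: only that job's own data is
# mounted, and the container gets no network access at all.
!docker run --rm --network none -v $JOB_DIR:/job \
    funwave-runtime python /job/run.py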

User account proxies

Another common approach is for the gateway to simply operate on behalf of its end users, using the end users' own accounts. In this model, every action the gateway takes is performed as the end user.

Token management

The method the gateway uses to obtain access to the user's resources varies, and the security implications vary with it. In this situation, it is unlikely your gateway will have access to the client's credentials. Thus, while access token refresh may be a possibility, so too are race conditions between jobs running on different hosts. In general, if your notebooks have long-running tasks, it is better to break the notebook up into discrete parts that Agave can manage for you, handling the refresh and token management for individual jobs rather than introducing race conditions or unnecessarily exposing the user's credentials.
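
For example, each stage of a long analysis can be submitted as its own Agave job, so no single process has to hold a user token for the full duration. The sketch below writes a hypothetical job definition and submits it with the CLI; the app ID, storage system, and notebook name are placeholders.

In [ ]:
import json

# One self-contained stage of the analysis, archived back to the
# user's own storage when it completes.
job = {
    'name': 'funwave-stage-1',
    'appId': 'cloud-runner-0.1.0',   # one of the apps we search for below
    'archive': True,
    'inputs': {'notebook': 'agave://data.storage/stage-1.ipynb'},
    'parameters': {}
}
with open('job.json', 'w') as f:
    json.dump(job, f)

In [ ]:
!jobs-submit -F job.json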

Notebook management

When running as the end user, using their own compute and storage resources, you avoid many of the complexities community accounts face in securing notebook executions. Care should still be taken to properly namespace job runs by user and process to prevent unintended bleed when users share data.



Dependency management

As a gateway, you can provide a superior user experience by controlling your runtime environment. This can be challenging when leveraging heterogeneous shared resources. Whenever possible, we recommend leveraging Singularity to isolate pure compute codes. Singularity containers give you a high degree of portability and, while the project is still young, it is gaining good traction across the academic scientific computing community.

If your application requires more complex interactions or bidirectional communication to run, then consider utilizing Docker. It gives you the ability to secure your application, and to better manage your network and port allocation when running multiple instances of the same application while isolating traffic to approved sources.

In the event you cannot leverage either of these container technologies, it is preferable to use a mature dependency management system such as Lmod or Environment Modules to define and configure your environment on a per-job basis. Generally speaking, it is best if your notebooks do not have to think about configuring their environment. Leave the application logic in the notebook. Leave the runtime logic to the infrastructure.
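
For instance, the batch wrapper that launches a run can execute the notebook headlessly inside a container, so the notebook itself never touches environment setup. The image name below is a placeholder; jupyter nbconvert's --execute mode is a standard way to run a notebook offline.

In [ ]:
# Execute the notebook non-interactively inside a (hypothetical)
# Singularity image that bundles all of its dependencies.
!singularity exec funwave-runtime.simg \
    jupyter nbconvert --to notebook --execute \
    funwave.ipynb --output funwave-executed.ipynb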

Execution tips

The following are some tips that may help you as you implement your backend infrastructure to run your notebooks.

Multiuser environments

  • Namespace the data whenever possible (see the sketch after this list).

    Prefer $WORK/<project>/<username>/<uuid> over $WORK/<uuid> or <username>/<job name>

  • If possible, run under existing per-user system accounts so the operating system restricts access between users for you.
  • If using containers, mount just enough data to get the job done and no more.
  • Think about networking and cross-container communication. Don't just run everything with the default network.
  • Network encryption, please.
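
Below is a minimal sketch of the namespacing suggested in the first tip, assuming $WORK and the project name are supplied by your scheduler or gateway configuration:

In [ ]:
import os
import uuid

# Hypothetical values normally injected by the gateway or scheduler
work = os.environ.get('WORK', '/work')
project, username = 'funwave-tutorial', 'jdoe'

# $WORK/<project>/<username>/<uuid> keeps runs from colliding or leaking
job_dir = os.path.join(work, project, username, str(uuid.uuid4()))
os.makedirs(job_dir)
os.chmod(job_dir, 0o700)   # owner-only access where permissions apply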

Handling sensitive data

  1. Use delegated keys when possible. Limit scope and accessibility.
  2. Leverage short-term credentials. Limit to lifetime of the job. Expire aggressively when job ends.
  3. Memory over file. Pull credentials in at runtime when possible (sketched below).
  4. Think about the paper trail. Passing a secret in as an environment variable is not much better than writing it to a config file when the invocation command is stored on disk.
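
As an illustration of points 3 and 4, the sketch below pulls a short-lived token from the process environment into memory at runtime rather than reading it from a file or the command line; the variable name is a placeholder.

In [ ]:
import os

# Read the credential from the environment at runtime, removing it so it
# is not inherited by any child processes we spawn later.
token = os.environ.pop('AGAVE_ACCESS_TOKEN', None)   # hypothetical name
if token is None:
    raise RuntimeError('no access token provided for this run')

# The token now lives only in memory for the lifetime of the job.
headers = {'Authorization': 'Bearer ' + token}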

Debugging

Debugging your batch application can be done interactively by launching your notebook in a Jupyter server on the target resource rather than as a batch application.

If your target execution resource does not have publicly accessible compute nodes, you can launch a sidekick process with your job to tunnel out of the compute resource. There are a few common ways to do this.

SSH tunnel

Establishing an SSH tunnel from your job to the head node of your execution system, or to another host reachable via outbound communication, allows you to communicate directly with your notebook server while the job runs.
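
For example, a job script could open a reverse tunnel back to the head node before starting the notebook server; the host name and ports below are placeholders:

In [ ]:
# From the compute node: expose the local Jupyter server (port 8888) on
# the head node, so a browser session there can reach the notebook.
!ssh -f -N -R 8888:localhost:8888 headnode.example.edu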

Parameters and inputs

Searching and setting

To streamline the notebook for flexible execution, we need to parameterize the notebook inputs. To support multiple invocation methods, we will use a chained lookup mechanism to read these parameters in our notebook (a minimal hand-rolled sketch follows this list). The lookup order will be:

  1. Environment variables
  2. Local config file
  3. $HOME config file
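
Before turning to the confypy-based version below, here is a minimal hand-rolled sketch of the same chained lookup; the FUNWAVE_AMP key and settings.json file name are placeholder assumptions.

In [ ]:
import json
import os

def lookup(key, filename='settings.json'):
    # 1. Environment variables win outright
    if key in os.environ:
        return os.environ[key]
    # 2. A config file in the working directory, then 3. one in $HOME
    for path in (filename, os.path.join(os.path.expanduser('~'), filename)):
        if os.path.isfile(path):
            with open(path) as f:
                value = json.load(f).get(key)
            if value is not None:
                return value
    return None

AMP = lookup('FUNWAVE_AMP')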

In [ ]:
import runagavecmd as r

In [31]:
from confypy import Config
from confypy import Location

# Chain the lookups: locations are consulted in order until a value is found.
config = Config(chain=True)
config.locations = [
    Location.from_env_path('FUNWAVE_SETTINGS'),                 # config file named by an env var
    Location.from_path('/data/foo.json'),                       # local JSON config file
    Location.from_path('/data/foo.yaml'),                       # local YAML config file
    Location.from_env_keys(['FUNWAVE_AMP', 'FUNWAVE', 'BAZ'])   # individual env vars
]

# Default FUNWAVE input parameters, used when no override is found
AMP = 3.0
Xc = 250.0
Yc = 50.0
WID = 20.0



In [19]:
!urlencode '{"associationIds":"1021257183143390745-242ac11b-0001-007"}'


%7B%22associationIds%22%3A%221021257183143390745-242ac11b-0001-007%22%7D

In [38]:
!apps-search name.like=*cloud*


cloud-runner-0.1.0u1
cloud-runner-0.1.0

In [40]:
!apps-search name.like=*singularity*


singularity-runner-2.4.3
jfonner-run-singularity-4.2.3

In [42]:
!apps-search name.like=*fork*


stevenrbrandt-jetstream-fork-1.0
shelob-fork-app-stevenrbrandt-1.0
mike-fork-app-stevenrbrandt-1.0
stevenrbrandt-fork-1.0
fork-1.0
mrojas-clone2-jfonner-fork-1.0
jfonner-fork-1.0

In [ ]:
!metadata-list -V -Q '{"associationIds":"1021257183143390745-242ac11b-0001-007"}'